Introduction

The purpose of this report is to verify known issues that occur during the estimation of the covariance matrix in multivariate statistics, specifically for estimates from the multivariate normal distribution. These issues are known to arise, and to be exacerbated, when the dimensionality (p) of the covariance matrix to be estimated is large. We also investigate whether these issues can be alleviated by switching from the standard (maximum likelihood) estimator to a diagonalised version and/or the shrinkage estimator proposed by Ledoit and Wolf (2004). This is motivated by the importance of the covariance matrix in various statistical procedures and techniques, as well as its status as an important estimate (often seen in the form of the scaled correlation matrix) in its own right. For example, the covariance matrix is used in Principal Components Analysis (“PCA”) (Pearson, 1901), data whitening (Kessy, Lewin & Strimmer, 2018) and the Karhunen-Loeve Transform (Dony, 2001, pg 20-55). These techniques are themselves ubiquitous in statistics and other fields, finding use in regression, ranking, dimensionality reduction, clustering (Reris & Brooks, 2015), spherical K-means (Coates & Ng, 2012) and efficient image compression. Fields where covariance estimation is of direct relevance include finance (Bai & Shi, 2011) and genomics (Serra, Corretto, Fratello & Tagliaferri, 2017).

Method

Briefly, the population covariance matrix is a matrix of the pairwise covariances between a set of random variables, with their variances on the diagonal, expressed as;

\[ \Sigma = \begin{pmatrix} var(X_1) & cov(X_1,X_2) & cov(X_1,X_3) \\ cov(X_2,X_1) & var(X_2) & cov(X_2,X_3) \\ cov(X_3,X_1) & cov(X_3,X_2) & var(X_3) \end{pmatrix} \]

Where \(cov(X_i,X_j) = E[(X_i - \mu_{X_i})(X_j - \mu_{X_j})]\) and \(var(X_i) = E[(X_i - \mu_{X_i})^2]\). The diagonal entries are \(cov(X_i,X_i)\) and so reduce down to \(var(X_i)\).
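These definitions can be checked numerically. A minimal sketch (the sample means here stand in for the population means \(\mu\), and the simulated variables are illustrative only);

```r
# Empirical check of the definitions above: the covariance of two samples
# is the average product of their deviations from their means, and the
# variance is the average squared deviation.
set.seed(1)
x <- rnorm(1e5)            # X_1, standard normal
y <- 0.5 * x + rnorm(1e5)  # X_2, correlated with X_1 (true cov = 0.5)
cov_xy <- mean((x - mean(x)) * (y - mean(y)))  # should be close to 0.5
var_x  <- mean((x - mean(x))^2)                # should be close to 1
```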

We consider the situation where \(X_1,X_2,...,X_n \stackrel{i.i.d}{\sim} \mathcal{N}(\mu,\Sigma)\) is a sample of size \(n\) from a multivariate normal distribution over \(\mathbb{R}^p\).

In order to generate samples from a multivariate normal distribution, we must specify the parameters of the distribution. Here, we specify that \(\mu = \mathbf{0}\) (i.e. that the mean vector is zero in each element) and we specify four types of covariance structure: the identity covariance matrix, a covariance matrix with a linearly decaying diagonal, a covariance matrix with an exponentially decaying diagonal, and one form of Toeplitz matrix. All four of these matrices meet the requirements that the matrix is symmetric, i.e. \(A = A^T\), and positive definite, i.e. all of the eigenvalues are strictly positive. For the three diagonal structures this is immediate, as the eigenvalues of a diagonal matrix are simply its diagonal elements, all of which are positive here. For the Toeplitz matrix, \(0.5^{|i-j|}\) is the autocorrelation matrix of a stationary AR(1) process with coefficient 0.5, which is positive definite.

For each covariance structure, we generate samples of a size \(n\) dependent on the dimensionality. Sample sizes range over \(n = \{p, p \log(p), 5p \log(p), 10p \log(p)\}\) and the dimensionalities (read: matrix width) of the samples range over \(p = \{5,10,50,100,250\}\). For each combination of covariance structure, sample size and dimensionality, we repeat the simulation 20 times, using replicate(). This means we generate 4 (sample sizes) x 5 (dimensionalities) x 4 (covariance structures) x 20 (replications/simulations) matrices. Hence, we generate 1,600 matrices.

Below are the enumerated combinations of sample size and dimensionality we are exploring;

Sample Sizes
              p=5   p=10   p=50   p=100   p=250
n=p             5     10     50     100     250
n=p*log(p)      9     24    196     461    1381
n=5p*log(p)    41    116    979    2303    6902
n=10p*log(p)   81    231   1957    4606   13804
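The grid above can be reproduced directly. A small sketch, assuming the fractional sample sizes are rounded up with ceiling() (which reproduces the tabulated values);

```r
# Reconstructing the sample-size grid from the n and p formulas above.
p <- c(5, 10, 50, 100, 250)
n_grid <- rbind(
  "n=p"          = p,
  "n=p*log(p)"   = ceiling(p * log(p)),
  "n=5p*log(p)"  = ceiling(5 * p * log(p)),
  "n=10p*log(p)" = ceiling(10 * p * log(p))
)
```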

Matrix Generation

Identity

\[\Sigma_1 := \mathbb{I}_p\] As above, the identity covariance matrix is simply the identity matrix (of dimensionality p), with 1 in each diagonal entry and 0 in the off-diagonal entries.

Example Identity Covariance Matrix Structure (p = 3)
1 0 0
0 1 0
0 0 1

Linear

\[\Sigma_2 := \Sigma_2(i,j) = 0 \ \text{if} \ i \neq j \quad \text{and} \quad \Sigma_2(i,i) = p + 1 - i\] As above, this is a diagonally dominant symmetric matrix with 0s in the off-diagonal entries and a linearly decreasing sequence, in unit steps, down the diagonal.

Example Linearly Decaying Covariance Matrix Structure (p = 3)
3 0 0
0 2 0
0 0 1

Exponential

\[ \Sigma_3 := \Sigma_3(i,j) = 0 \ \text{if} \ i \neq j \quad \text{and} \quad \Sigma_3(i,i) = p^{2-i}\] As above, this is a symmetric matrix with 0s on the off-diagonals. The diagonal is an exponentially decreasing sequence.

Example Exponentially Decaying Covariance Matrix Structure (p = 3)
3.00 0.00 0.00
0.00 1.00 0.00
0.00 0.00 0.33

Toeplitz

\[\Sigma_4 := \Sigma_4(i,j) = 0.5^{|i - j|}\] As above, the Toeplitz matrix is symmetric. In contrast to the previous structures, its off-diagonal entries are non-zero: the diagonals are constant and the covariances (off-diagonals) decay with the distance between the matrix indices (such that \(cov(X_1,X_2)\) will be higher than \(cov(X_1,X_5)\)).

Example Toeplitz Covariance Matrix Structure (p = 3)
1.00 0.5 0.25
0.50 1.0 0.50
0.25 0.5 1.00
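The four structures above can be built in a few lines of base R (stats::toeplitz constructs a Toeplitz matrix from its first row). The helper name make_sigma is illustrative, not the report's actual code;

```r
# Sketch of the four covariance structures for a given dimensionality p.
make_sigma <- function(p) {
  list(
    identity    = diag(p),
    linear      = diag(p:1),                 # Sigma(i,i) = p + 1 - i
    exponential = diag(p^(2 - seq_len(p))),  # Sigma(i,i) = p^(2-i)
    toeplitz    = toeplitz(0.5^(0:(p - 1)))  # Sigma(i,j) = 0.5^|i-j|
  )
}
sigmas <- make_sigma(3)  # reproduces the p = 3 examples above
```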

This R Markdown document requires recompilation without caching. Caching such large vectors is not currently feasible, and reading the original samples back in is also not feasible without special methods (e.g. the R feather package). Hence, we skip generation of the sample matrices in this document and proceed directly to reading in the pre-generated covariance estimates.

Covariance estimation

The objective now is to investigate the estimation of the population covariance structure of the independent variables from which these matrices are sampled. The population covariance matrix structure is known. Here, we use three types of estimators: maximum likelihood with known mean, a diagonal estimator of only variances, and the Ledoit-Wolf shrinkage estimator.

We define these estimators and run each of them across the 1,600 matrices representing samples from the multivariate normal distribution, producing 4,800 covariance matrix estimates.

The Maximum-Likelihood Estimator

\[\hat{\Sigma} := \frac{1}{n} \sum\limits_{i=1}^{n} (x_i-\hat{\mu})(x_i-\hat{\mu})^T \quad \text{where} \quad \hat{\mu} := \frac{1}{n}\sum\limits_{i=1}^{n} x_i\] This is also known as the Sample Covariance Matrix. As we know the population mean vector, we substitute it into the equation, simplifying it;

\[\hat{\Sigma}_{\mu = 0} := \frac{1}{n} \sum\limits_{i=1}^{n} x_ix_i^T \]
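A minimal sketch of such a function (the report's actual ml_cov() implementation may differ; the mu argument defaults to the known zero mean);

```r
# Known-mean ML covariance estimator: (1/n) * sum of x_i x_i^T after
# subtracting the known mean vector mu.
ml_cov <- function(x, mu = 0) {
  x <- sweep(x, 2, mu)    # subtract the known mean from each column
  crossprod(x) / nrow(x)  # equivalent to (1/n) * t(x) %*% x
}
```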

We see that the maximum likelihood estimate of covariance (using our function, ml_cov) is very similar to the inbuilt R estimate using cov() or var() (which would, in contrast, use an estimate of the mean vector, when supplied with a matrix).

Example Covariance Estimate using ml_cov()
1.21 0.03 0.15 -0.67 0.10
0.03 0.85 -0.20 0.28 -0.25
0.15 -0.20 1.34 0.39 0.76
-0.67 0.28 0.39 0.88 0.00
0.10 -0.25 0.76 0.00 1.44
Example Covariance Estimate using cov()
1.46 -0.09 0.41 -0.82 0.37
-0.09 0.74 0.26 0.40 0.26
0.41 0.26 0.85 0.42 0.05
-0.82 0.40 0.42 1.10 -0.07
0.37 0.26 0.05 -0.07 0.81

The Ledoit-Wolf Estimator

This is a shrinkage estimator, which shrinks the extreme entries of the covariance estimate towards the centre (Ledoit & Wolf, 2003). I found that the functions cov.shrink() and linshrink_cov(), from the corpcor and nlshrink packages respectively, both deviated from the results of the original MATLAB implementation (when run in MATLAB, and then translated to R) on a test matrix.

# We check that the packages are behaving correctly by translating the MATLAB code
LedoitWolf_cov_estimate <- function(x, shrink, return_shrink = FALSE) {
  t <- nrow(x)
  n <- ncol(x)

  # demean the matrix, as per the MATLAB code
  meanx <- colMeans(x)
  x <- sweep(x, 2, meanx)

  # sample covariance matrix of the demeaned data (divisor t, the ML form)
  sample_cov <- (1/t) * (t(x) %*% x)

  # compute the prior: the identity scaled by the mean sample variance
  mean_var <- mean(diag(sample_cov))
  prior <- mean_var * diag(1, n)

  # compute the shrinkage constant if none is provided
  if (missing(shrink)) {
    y <- x^2                                   # what they call p
    phiMat <- (t(y) %*% y) / t - sample_cov^2
    phi <- sum(phiMat)
    gamma <- norm(sample_cov - prior, type = "F")^2  # squared Frobenius norm

    kappa <- phi / gamma
    shrink <- max(0, min(1, kappa / t))
  } else if (length(shrink) != 1 || !is.atomic(shrink)) {
    # checking atomicity and length is the standard scalar check in R
    stop("shrink must be atomic and of length 1")
  }

  # convex combination of the prior and the sample covariance matrix
  sigma <- shrink * prior + (1 - shrink) * sample_cov
  if (!return_shrink) {
    return(sigma)
  } else {
    return(list('sigma' = sigma, 'shrinkage' = shrink))
  }
}

# checking package functions against custom function translated and tested against matlab code
x <- t(matrix(1:9,nrow = 3)) # test matrix 
setNames(as.data.frame(linshrink_cov(x)),c("","","")) %>% round(2) %>% kable() %>% kable_styling() %>% add_header_above(header = c("Example Covariance Estimate using linshrink_cov()" = 3)) # differ from matlab
Example Covariance Estimate using linshrink_cov()
9.00 5.62 5.62
5.62 9.00 5.62
5.62 5.62 9.00
## Estimating optimal shrinkage intensity lambda.var (variance vector): 1 
## 
## Estimating optimal shrinkage intensity lambda (correlation matrix): 0.25
Example Covariance Estimate using cov.shrink()
9.00 6.75 6.75
6.75 9.00 6.75
6.75 6.75 9.00
## Estimating optimal shrinkage intensity lambda.var (variance vector): 1 
## 
## Estimating optimal shrinkage intensity lambda (correlation matrix): 0.25
## $dim
## [1] 3 3
## 
## $lambda
## [1] 0.25
## 
## $lambda.estimated
## [1] TRUE
## 
## $class
## [1] "shrinkage"
## 
## $lambda.var
## [1] 1
## 
## $lambda.var.estimated
## [1] TRUE
Example Covariance Estimate using our function
6.0 4.5 4.5
4.5 6.0 4.5
4.5 4.5 6.0
## [1] 0.25

The Diagonalised Maximum-Likelihood Estimator

This is simply the diagonal entries of the maximum likelihood covariance matrix. It has the advantage of being positive definite and of incorporating structure into the estimation (by keeping only the diagonal entries) that is known to be true for three of the four covariance structures. This is the idea of imposing some ad-hoc structure on the covariance matrix, such as diagonality (in our case) or a factor model (e.g. as used in finance) (Ledoit & Wolf, 2004). It bears the disadvantage that the model will be misspecified when we lack prior knowledge of the covariance structure (as is the case when using this estimator to estimate a Toeplitz covariance structure).

\[\hat{\Sigma}_{DIAG} := diag(\hat{\Sigma}_{ML})\]
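In R this is a one-liner (the function name diag_ml_cov is illustrative, not necessarily the report's own): the inner diag() extracts the diagonal of the ML estimate, and the outer diag() rebuilds a diagonal matrix from it;

```r
# Diagonalised ML estimator: zero out the off-diagonal entries of a
# p x p ML covariance estimate.
diag_ml_cov <- function(ml_est) diag(diag(ml_est))
```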

Examples

##                mpg cyl disp
## Mazda RX4     21.0   6  160
## Mazda RX4 Wag 21.0   6  160
## Datsun 710    22.8   4  108
##       mpg   cyl  disp
## mpg   467 114.4  3061
## cyl   114  29.3   784
## disp 3061 784.0 20955
##      [,1] [,2]  [,3]
## [1,]  467  0.0     0
## [2,]    0 29.3     0
## [3,]    0  0.0 20955
##        mpg  cyl  disp
## mpg   50.7 -0.6 -15.6
## cyl   -0.6 50.9  17.3
## disp -15.6 17.3 500.9

Due to caching issues, the estimates are pre-generated and read in, but can be generated with the code below.

Assessing Covariance Estimates

In order to assess the covariance estimators, we must have a ‘ground truth’ comparison for some of the criteria we are using. For example, this is necessary to calculate the sum of squared errors. It is also necessary to make comparisons of the eigenvalues and eigenvectors. Hence, we generate the covariance structure associated with each dimensionality below.

Now that we have generated our samples of specified sample size (n) and dimensionality (p), we require a way to assess the quality of our estimates. Hence, we will use five criteria;

  • Eigenvalues
  • Trace
  • Dot product
  • Sum of Square Errors
  • Stability of Estimates

We will focus on comparisons of the criterion for each estimator across the parameter space in n and p. The visualisations can be read downward in a column to compare constant dimensionality with increasing sample size, and left to right row-wise to compare constant sample size with increasing dimensionality.

For each set of plots, we note what is observed.

Eigenvalues

To address this question, we must first take the true eigenvalues of the covariance structures. As demonstrated, these are equal to the diagonal values of the covariance structures for all but the Toeplitz covariance structure. The eigenvalues reflect the variance along the corresponding eigenvectors of the covariance matrix. These are plotted below as red lines. The covariance estimates are eigendecomposed and the largest and smallest eigenvalues retrieved.

Identity Covariance Matrix

The Ledoit-Wolf (LW) estimator estimates the smallest eigenvalues more accurately than the maximum likelihood (ML) and diagonalised maximum likelihood (DML) estimators. The ML estimator slightly underestimates the smallest eigenvalues and the DML estimator underestimates them considerably - at small sample sizes its estimates tend to be zero.

The estimators tend to have more divergent and concentrated estimates of the smallest eigenvalues at high dimensionality, while low-dimensional estimates tend to be more diffuse and mixed (compare the left-most column to the right-most). We observe a similar trend for the largest eigenvalues: these are very close to the true covariance eigenvalues for LW, close for ML and quite divergent for DML. Again, this trend is magnified by increases in dimensionality. The severity of misestimation of the largest eigenvalues tends to lessen as the sample size increases, especially for the DML estimator.

Linearly Decaying Covariance Matrix

We observe that the LW estimator misestimates the smallest eigenvalues across n and p, while the DML and ML estimators estimate the smallest eigenvalue reasonably well. For the largest eigenvalues, the ML estimator performs best. The LW estimator performs well where \(n \leq p\). The DML estimator performs worst.

Exponentially Decaying Covariance Matrix

The DML and ML estimators accurately estimate the smallest eigenvalues, whilst LW introduces some bias. This is greatest where both n and p are low. For the largest eigenvalues the estimators perform similarly, being reasonably accurate in general.

Toeplitz Covariance Matrix

Here we see a striking divergence in performance. At both low n and low p the estimators perform similarly. At high p and low n, the ML estimator underestimates the largest eigenvalues and the DML overestimates them. For the smallest eigenvalues, at high n, and at high n and high p, the DML and LW estimators perform much better than the ML estimator. Across all combinations, LW estimates the eigenvalues best and the DML estimator performs roughly equally to, or better than, ML. ML tends to underestimate the largest eigenvalues and overestimate the smallest, whilst DML behaves oppositely.

These observations can be summarised as follows;

Smallest Eigenvalues

Rank   Identity   Linear   Exponential   Toeplitz
1st    LW         ML       ML/DML        LW
2nd    ML         DML      —             DML
3rd    DML        LW       LW            ML

Largest Eigenvalues

Rank   Identity   Linear   Exponential   Toeplitz
1st    LW         ML       ML/DML/LW     LW
2nd    ML         LW       —             DML
3rd    DML        DML      —             ML

Eigenvalues Dot Plots

# defining common ggproto objects to reduce verbosity 
rug_ <- geom_rug(mapping = aes(x = value, colour = variable),show.legend = F,alpha = 1/2) 
ridges_ <- stat_density_ridges(mapping = aes(x = value,y = variable,fill = variable),color = "black",quantile_lines = T,quantiles = 2)
theme_ridge <- theme_light() + theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_blank(),axis.title = element_text(size = 7)) +  theme(plot.title = element_text(size = 7))
# + guides(color = FALSE) + scale_y_discrete(expand = expand_scale(add = c(0.2, 2.5)))
 
 # ML,D-ML,LW Plotting Function

plot_eigens <- function(x,y,z,start,end,covariance_structure,density = F) {
    
    s <- switch (covariance_structure,
                             "Identity" = 1,
                             "Linear" = 2,
                             "Exponential" = 3,
                             "Toeplitz" = 4
    )
    
    ref <- my_flatten(eigen_range_cov_structures[[s]])
    
    options(digits = 3, scipen = -2)
     eigen_smallest <- list()
     eigen_biggest <- list()
     df_small <- data.frame(matrix(NA, ncol=1, nrow=21)[-1]) # from SO
     df_large <- data.frame(matrix(NA, ncol=1, nrow=21)[-1]) # from SO
     for (i in 1:20) {
        title <- names(my_flatten(x)[start+i])
        title <- str_remove(pattern = str_extract(pattern = "[^.]*.[^.]*",string = title),string = title)
        title <- substr(title,2,nchar(title))
        df_x <- as.data.frame(do.call(rbind, my_flatten(x)[seq(start+i,end,by = 20)]))
        df_y <- as.data.frame(do.call(rbind, my_flatten(y)[seq(start+i,end,by = 20)]))
        df_z <- as.data.frame(do.call(rbind, my_flatten(z)[seq(start+i,end,by = 20)]))
        colnames(df_x) <- c('Smallest_ML',"Largest_ML")
        colnames(df_y) <- c('Smallest_D_ML',"Largest_D_ML")
        colnames(df_z) <- c('Smallest_TP',"Largest_TP")
        quiet(df <- melt(cbind(df_x,df_y,df_z)))
        df_small <- df[(df[,1] == "Smallest_ML" | df[,1] == "Smallest_D_ML" | df[,1] == "Smallest_TP" ),]
        df_large <- df[(df[,1] == "Largest_ML" | df[,1] == "Largest_D_ML" | df[,1] == "Largest_TP" ),]
        v <- as.numeric(ref[[rep(seq(1:5),4)[i]]])
            if (density) {
            
                g <- ggplot(data = df_small) + 
                    ridges_+
                    rug_ +
                    geom_vline(xintercept = v[1],color = 'red',show.legend = F) +
                    labs(title = title,y = "density", x = "smallest eigenvalues") +
                    theme_ridge + guides(color = FALSE) + scale_y_discrete(expand = expand_scale(add = c(0.2, 2.5))) + # scale_y fixes tops of ridges being cut off
                    scale_fill_discrete(name = paste(covariance_structure," Structure |  Estimator: "),labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
                
                gl <- ggplot(data = df_large) + 
                    ridges_+
                    rug_ +
                    geom_vline(xintercept = v[2],color = 'red',show.legend = F) +
                    labs(title = title,y = "density", x = "largest eigenvalues") +
                    theme_ridge + guides(color = FALSE) + scale_y_discrete(expand = expand_scale(add = c(0.2, 2.5))) +
                    scale_fill_discrete(name = paste(covariance_structure," Structure |  Estimator:"),labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
            } else if (!density) {
            if (covariance_structure == "Identity") {
                g <- ggplot(data = df_small) + 
                    geom_dotplot(mapping = aes(x = value,color = variable,fill = variable),stackgroups = T,method = "histodot") +
                    geom_vline(xintercept = v[1],color = 'red') +
                    labs(title = title,y = "count", x = "smallest eigenvalues") +
                    theme_light() +
                    theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_text(size = 7),axis.title = element_text(size = 7)) +
                    guides(color = FALSE) +
                    theme(plot.title = element_text(size = 7)) + scale_fill_discrete(name = "Identity Structure | Estimator",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
            } else if (covariance_structure == "Exponential") {
                g <- ggplot(data = df_small) + 
                    geom_dotplot(mapping = aes(x = value,color = variable,fill = variable),stackgroups = T,method = "histodot") +
                    geom_vline(xintercept = v[1],color = 'red') +
                    labs(title = title,y = "count", x = "smallest eigenvalues") +
                    theme_light() +
                    theme(plot.title = element_text(size = 7)) +
                    guides(color = FALSE) +
                    theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_text(size = 7),axis.title = element_text(size = 7)) + 
                    scale_fill_discrete(name = "Exponential Structure | Estimator",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
            } else {
                g <- ggplot(data = df_small) + 
                    geom_dotplot(mapping = aes(x = value,color = variable,fill = variable),stackgroups = T,method = "histodot") +
                    geom_vline(xintercept = v[1],color = 'red') +
                    labs(title = title,y = "count", x = "smallest eigenvalues") +
                    theme_light() +
                    theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_text(size = 7),axis.title = element_text(size = 7)) +
                    guides(color = FALSE) +
                    theme(plot.title = element_text(size = 7)) + scale_fill_discrete(name = paste(covariance_structure," Structure |  Estimator:"),labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
            }
            
            if (covariance_structure == "Identity") {
                gl <- ggplot(data = df_large) + 
                    geom_dotplot(mapping = aes(x = value,color = variable,fill = variable),stackgroups = T,method = "histodot") +
                    geom_vline(xintercept = v[2],color = 'red') +
                    labs(title = title,y = "count", x = "largest eigenvalues") +
                    theme_light() +
                    guides(color = FALSE) +
                    theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_text(size = 7),axis.title = element_text(size = 7)) +
                    theme(plot.title = element_text(size = 7)) + 
                    scale_fill_discrete(name = "Identity Structure | Estimator",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))   # a mad daft hack: put the title in the legend
            } else if (covariance_structure == "Exponential") {
                gl <- ggplot(data = df_large) + 
                    geom_dotplot(mapping = aes(x = value,color = variable,fill = variable),stackgroups = T,method = "histodot") +
                    geom_vline(xintercept = v[2],color = 'red') +
                    labs(title = title,y = "count", x = "largest eigenvalues") +
                    theme_light() +
                    guides(color = FALSE) +
                    theme(plot.title = element_text(size = 7)) +
                    theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_text(size = 7),axis.title = element_text(size = 7)) +
                    scale_fill_discrete(name = "Exponential Structure | Estimator: ",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
                    
            } else {
                gl <- ggplot(data = df_large) + 
                    geom_dotplot(mapping = aes(x = value,color = variable,fill = variable),stackgroups = T,method = "histodot") +
                    geom_vline(xintercept = v[2],color = 'red') +
                    labs(title = title,y = "count", x = "largest eigenvalues") +
                    theme_light() +
                    guides(color = FALSE) +
                    theme(axis.text.x = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.y = element_text(size = 7),axis.title = element_text(size = 7)) +
                    theme(plot.title = element_text(size = 7)) + scale_fill_discrete(name = paste(covariance_structure," Structure |  Estimator: "),labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage")) 
                }
            } 
        eigen_smallest[[i]] <- g
        eigen_biggest[[i]] <- gl
     }
     density_ <- NULL;dotplot_ <- NULL
     if (density) { 
        density_ <- "KDEs"
        dotplot_ <- ""
     } else { 
        dotplot_ <- "Dotplots" 
        density_ <- ""
     }
     
     nplots = 20
     ncol = 5
     nrow = 4
     eval(parse(text = paste0("quiet(grob <- grid_arrange_shared_legend(", paste0("eigen_smallest", "[[", c(1:nplots), "]]", sep = '', collapse = ','), ",ncol =", ncol, ",nrow =", nrow, ", position = 'bottom',  
                                                     top=grid::textGrob(paste(dotplot_,density_,'of Smallest Eigenvalues from ',covariance_structure,' Covariance Estimates'), gp=grid::gpar(fontsize=12)),bottom=grid::textGrob('Red lines indicate true population covariance smallest eigenvalues',gp = gpar(fontface = 3, fontsize = 9),hjust = 1,x = 1),plot = T))", sep = '')))
     eval(parse(text = paste0("quiet(grob_large <- grid_arrange_shared_legend(", paste0("eigen_biggest", "[[", c(1:nplots), "]]", sep = '', collapse = ','), ",ncol =", ncol, ",nrow =", nrow, ", position = 'bottom',  
                                                     top=grid::textGrob(paste(dotplot_,density_,'of Largest Eigenvalues from ',covariance_structure,' Covariance Estimates'), gp=grid::gpar(fontsize=12)),bottom=grid::textGrob('Red lines indicate true population covariance largest eigenvalues',gp = gpar(fontface = 3, fontsize = 9),hjust = 1,x = 1),plot = T))", sep = '')))
     

     return(list('small' = df_small,'large' = df_large,'g' = grob,'gl' = grob_large))
}
    # we expect largest and smallest values to be 1

Toeplitz - Dotplots

These matrices are common in the analysis of stationary stochastic processes and arise frequently when dealing with psychometric/biometric and time series data (Mukherjee & Maiti, 1988). We see that the largest eigenvalue tends towards 3 as \(p\) increases, and the smallest tends towards \(\frac{1}{3}\).

True Toeplitz Covariance Matrix Eigenvalues
           p=5    p=10   p=50    p=100   p=250
Smallest   0.36   0.34   0.334   0.333   0.333
Largest    2.26   2.68   2.979   2.994   2.999
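These limits can be checked numerically: for this AR(1)-type Toeplitz structure with coefficient \(\rho = 0.5\), the spectrum lies between \((1-\rho)/(1+\rho) = 1/3\) and \((1+\rho)/(1-\rho) = 3\);

```r
# Eigenvalue range of the p = 250 Toeplitz covariance structure;
# range(ev) approaches c(1/3, 3) as p grows.
ev <- eigen(toeplitz(0.5^(0:249)), symmetric = TRUE, only.values = TRUE)$values
range(ev)
```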


Eigenvalues Density Plots

Trace of the Covariance Matrix

The trace of a matrix is the sum of its diagonal entries, which equals the sum of its eigenvalues. For a covariance matrix, the diagonal entries reflect the variance of each individual component of the random vector \(\vec{X} = (X_1,X_2,...,X_p)\), so the trace reflects the overall variation present. Again, we wish to see how closely this matches the true total variation present in the covariance matrix structure we are using.

As the DML estimates are simply calculated from the diagonals of the ML estimates, the trace of the ML and DML estimates will be identical. There is a high degree of consistency between the trace estimates from ML/DML and LW. On average it is difficult to say which estimator produces better results, but at high dimensionalities, a small improvement in the LW estimates over the ML/DML estimates can be observed (from the proximity of the black line, the median, to the red line, the true population covariance matrix trace).

As the sample size increases, the variance of the trace estimates decreases for all estimators. As the dimensionality increases, the variance of the trace estimates increases for all estimators.
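The equality of the ML and DML traces noted above can be checked directly; a small sketch on simulated data (the estimator code here mirrors the definitions earlier, not the report's exact functions);

```r
# The DML estimate keeps the ML diagonal, so the two traces coincide exactly.
set.seed(7)
X <- matrix(rnorm(50 * 5), 50, 5)
S <- crossprod(X) / nrow(X)  # known-mean ML estimate
D <- diag(diag(S))           # diagonalised ML estimate
sum(diag(S)) == sum(diag(D))
```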

Trace Density Plots

#---- Q2 Plots ----

# As some of the kernel density estimates overlapped nearly perfectly, decided to use ridgeline plots to clearly show all three estimators

plot_traces <- function(x,y,z,start,end,covariance_structure) {
    options(digits = 3, scipen = -2)

    df_trace <- data.frame(matrix(NA, ncol=1, nrow=21)[-1]) # from SO
    
    s <- switch (covariance_structure,
                    "Identity" = 1,
                    "Linear" = 2,
                    "Exponential" = 3,
                    "Toeplitz" = 4
    )
    
    ref <- my_flatten(trace_cov_structures[[s]])
    traces <- list()
    
    for (i in 1:20) {
        title <- names(my_flatten(x)[start+i])
        title <- str_remove(pattern = str_extract(pattern = "[^.]*.[^.]*",string = title),string = title)
        title <- substr(title,2,nchar(title))
        df_x <- as.data.frame(do.call(rbind, my_flatten(x)[seq(start+i,end,by = 20)]))
        df_y <- as.data.frame(do.call(rbind, my_flatten(y)[seq(start+i,end,by = 20)]))
        df_z <- as.data.frame(do.call(rbind, my_flatten(z)[seq(start+i,end,by = 20)]))
        

        colnames(df_x) <- 'Trace_ML'
        colnames(df_y) <- 'Trace_D_ML'
        colnames(df_z) <- 'Trace_TP'
        v <- as.numeric(ref[[rep(seq(1:5),4)[i]]])
        quiet(df_trace <- melt(cbind(df_x,df_y,df_z)))
            g <- ggplot(data = df_trace) + 
                ridges_+
                geom_vline(xintercept = v,colour = "red",show.legend = F) +
                rug_ +
                labs(title = title,y = "density", x = "trace") +
                theme_ridge + guides(color = FALSE) + scale_y_discrete(expand = expand_scale(add = c(0.2, 2.5))) +
                scale_fill_discrete(name = "Estimator:",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
        traces[[i]] <- g
    }

nplots = 20
ncol = 5
nrow = 4
eval(parse(text = paste0("quiet(grob <- grid_arrange_shared_legend(", paste0("traces", "[[", c(1:nplots), "]]", sep = '', collapse = ','), ",ncol =", ncol, ",nrow =", nrow, ", position = 'bottom',  
                                                    top=grid::textGrob(paste('KDEs of Population Trace',covariance_structure,' Covariance Estimates'), gp=grid::gpar(fontsize=12)),bottom=grid::textGrob('Red lines indicate true population covariance trace',gp = gpar(fontface = 3, fontsize = 9),hjust = 1,x = 1),plot = T))", sep = '')))
    
return(list('trace' = df_trace,'g' = grob))
}

Linear - Density

The trace of the linearly decaying covariance structure will simply be the pth triangular number, i.e. \(\sum\limits_{k=1}^{p} k = \frac{p(p + 1)}{2}\).

True Linear Covariance Matrix Trace
        p=5   p=10   p=50   p=100   p=250
Trace   15    55     1275   5050    31375
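The tabulated traces can be verified directly from the triangular-number formula;

```r
# trace(Sigma_2) = sum of the diagonal p, p-1, ..., 1 = p(p+1)/2.
p <- c(5, 10, 50, 100, 250)
tr <- sapply(p, function(k) sum(k:1))
```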


Exponential - Density

We can see that the trace of the covariance matrix with exponential decay rapidly approaches \(p + 1\), i.e. it exceeds the dimension by roughly one.

True Exponential Covariance Matrix Trace
        p=5    p=10   p=50   p=100   p=250
Trace   6.25   11.1   51     101     251
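These values follow from the geometric sum of the diagonal entries, which can be checked in a couple of lines;

```r
# trace(Sigma_3) = sum over i of p^(2-i), a geometric sum that
# approaches p + 1 as p grows.
p <- c(5, 10, 50, 100, 250)
tr <- sapply(p, function(k) sum(k^(2 - seq_len(k))))
round(tr, 2)
```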


Principal Component of the Covariance Matrix

As the eigenvectors of a symmetric matrix are orthogonal [2], we can take the dot product of the principal component of the true covariance matrix and the principal component of the estimated covariance matrix as a measure of how well the eigenvectors are estimated. This is demonstrated below, by summing all possible eigenvector dot products for the Toeplitz covariance matrix structure;

## [1] 5

We get 5, which is the sum of the dot products of each eigenvector with itself (= 1 each); the cross-products vanish by orthogonality. Hence, if the principal component of the estimated covariance matrix is off, the other eigenvectors will be as well.
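This orthonormality argument can be demonstrated explicitly (here with the p = 5 Toeplitz structure; the choice of p is illustrative): for a symmetric matrix, the eigenvector matrix \(V\) returned by eigen() satisfies \(V^T V = I\), so summing every pairwise dot product leaves only the p self-products;

```r
# Sum of all pairwise eigenvector dot products for a symmetric matrix.
V <- eigen(toeplitz(0.5^(0:4)), symmetric = TRUE)$vectors
sum(crossprod(V))  # t(V) %*% V is (numerically) the identity, so the sum is p = 5
```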

eigen() returns normalised eigenvectors satisfying \(Av = \lambda v\) (i.e. right eigenvectors). As all our covariance structures are symmetric matrices, in principle the left and right eigenvectors of our estimates should be the same [1]; any numerical asymmetry in the covariance matrix estimates would make them differ slightly.

We check the estimation of the principal components by checking the angle between the true covariance matrix principal eigenvector and the estimated covariance matrix principal eigenvector;

\[\frac{v^T\hat{v}}{\| v \| \| \hat{v} \|}\]

In R the output of eigen() is normalised anyway - so the denominator of the normalised dot product will be one, and can hence be left out of the function. A dot product of zero indicates the vectors are perpendicular (poor estimation) and a dot product of one indicates the vectors coincide (good estimation). Dot products less than zero indicate an anti-correlated vector (to a minimum of -1, a vector that is parallel but opposite in direction to the true covariance principal eigenvector).

# Only the first principal component is needed (if it is wrong, so are the others)
eigen_dot_product <- function(x, m) {
    return(eigen(x)$vectors[, 1] %*% eigen(m)$vectors[, 1])
}

# Normalised Dot Products of Estimates (x) vs Ground Truth (m)

identity_cov_struct_flat <- rep(my_flatten(cov_structures[[1]]),80)
linear_cov_struct_flat <- rep(my_flatten(cov_structures[[2]]),80)
exponential_cov_struct_flat <- rep(my_flatten(cov_structures[[3]]),80)
toeplitz_cov_struct_flat <- rep(my_flatten(cov_structures[[4]]),80)

# ML
identity_cov_flat <- my_flatten(ml_cov_estimates[[1]])
dot_products_ml_identity <- map2(identity_cov_flat, identity_cov_struct_flat, eigen_dot_product)
 
linear_cov_flat <- my_flatten(ml_cov_estimates[[2]])
dot_products_ml_linear <- map2(linear_cov_flat, linear_cov_struct_flat, eigen_dot_product)
 
exponential_cov_flat <- my_flatten(ml_cov_estimates[[3]])
dot_products_ml_exponential <- map2(exponential_cov_flat, exponential_cov_struct_flat, eigen_dot_product)
 
toeplitz_cov_flat <- my_flatten(ml_cov_estimates[[4]])
dot_products_ml_toeplitz <- map2(toeplitz_cov_flat, toeplitz_cov_struct_flat, eigen_dot_product)
 
 # Diag ML
identity_cov_flat <- my_flatten(diag_ml_cov_estimates[[1]])
dot_products_diag_ml_identity <- map2(identity_cov_flat, identity_cov_struct_flat, eigen_dot_product)
 
linear_cov_flat <- my_flatten(diag_ml_cov_estimates[[2]])
dot_products_diag_ml_linear <- map2(linear_cov_flat, linear_cov_struct_flat, eigen_dot_product)
 
exponential_cov_flat <- my_flatten(diag_ml_cov_estimates[[3]])
dot_products_diag_ml_exponential <- map2(exponential_cov_flat, exponential_cov_struct_flat, eigen_dot_product)
 
toeplitz_cov_flat <- my_flatten(diag_ml_cov_estimates[[4]])
dot_products_diag_ml_toeplitz <- map2(toeplitz_cov_flat, toeplitz_cov_struct_flat, eigen_dot_product)
 
 # Ledoit-Wolf
identity_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[1]])
dot_products_LedoitWolf_cov_estimates_identity <- map2(identity_cov_flat, identity_cov_struct_flat, eigen_dot_product)
 
linear_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[2]])
dot_products_LedoitWolf_cov_estimates_linear <- map2(linear_cov_flat, linear_cov_struct_flat, eigen_dot_product)
 
exponential_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[3]])
dot_products_LedoitWolf_cov_estimates_exponential <- map2(exponential_cov_flat, exponential_cov_struct_flat, eigen_dot_product)
 
toeplitz_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[4]])
dot_products_LedoitWolf_cov_estimates_toeplitz <- map2(toeplitz_cov_flat, toeplitz_cov_struct_flat, eigen_dot_product)

Identity Covariance Matrix

At low n and p, the principal eigenvector is frequently misestimated. As p increases, estimation of the principal eigenvector deteriorates further, until the estimated vector and the true vector are consistently perpendicular (dot product of 0). In a minority of cases, the LW estimate corresponds closely with the true eigenvector. The LW estimator performs better at high p (except when \(n \leq p\)) and at high n, whereas the ML/DML estimators perform poorly in all cases.

Linearly Decaying Covariance Matrix

The estimators perform poorly at high p and low n. The ML estimates are quite unstable, as indicated by the bimodal KDEs.

Dot Product Density Plots

# density plots
# As some of the kernel density estimates overlapped nearly perfectly, decided to use ridgeline plots to clearly show all three estimators
dp_diag_ml <- list('identity' = dot_products_diag_ml_identity,'linear' = dot_products_diag_ml_linear,'exponential' = dot_products_diag_ml_exponential,'toeplitz' = dot_products_diag_ml_toeplitz)
dp_ml <- list('identity' = dot_products_ml_identity,'linear' = dot_products_ml_linear,'exponential' = dot_products_ml_exponential,'toeplitz' = dot_products_ml_toeplitz)
dp_LedoitWolf <- list('identity' = dot_products_LedoitWolf_cov_estimates_identity,'linear' = dot_products_LedoitWolf_cov_estimates_linear,'exponential' = dot_products_LedoitWolf_cov_estimates_exponential,'toeplitz' = dot_products_LedoitWolf_cov_estimates_toeplitz)

plot_dps <- function(x,y,z,start,end,covariance_structure) {
    options(digits = 3, scipen = -2)
    # x is estimator list of eigens
    dps <- list()
    
    df_dp <- data.frame(matrix(NA, ncol=1, nrow=21)[-1]) # from SO
    
    s <- switch (covariance_structure,
                             "Identity" = 1,
                             "Linear" = 2,
                             "Exponential" = 3,
                             "Toeplitz" = 4
    )
    
    for (i in 1:20) {
        title <- names(my_flatten(x)[start+i])
        title <- str_remove(pattern = str_extract(pattern = "[^.]*.[^.]*",string = title),string = title)
        title <- substr(title,2,nchar(title))
        df_x <- as.data.frame(do.call(rbind, my_flatten(x)[seq(start+i,end,by = 20)]))
        df_y <- as.data.frame(do.call(rbind, my_flatten(y)[seq(start+i,end,by = 20)]))
        df_z <- as.data.frame(do.call(rbind, my_flatten(z)[seq(start+i,end,by = 20)]))
        
        
        colnames(df_x) <- 'dp_ML'
        colnames(df_y) <- 'dp_D_ML'
        colnames(df_z) <- 'dp_TP'
        quiet(df_dp <- melt(cbind(df_x,df_y,df_z)))
        g <- ggplot(data = df_dp) + 
            stat_density_ridges(mapping = aes(x = value,y = variable,fill = variable),color = "black",quantile_lines = T,quantiles = 2) +
            geom_vline(xintercept = 1,color = 'red') +
            rug_ +
            labs(title = title,y = "density", x = "dot products") +
            theme_ridge + guides(color = FALSE) + scale_y_discrete(expand = expand_scale(add = c(0.2, 2.5))) +
            scale_fill_discrete(name = "Estimator:",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
        dps[[i]] <- g
    }

    nplots = 20
    ncol = 5
    nrow = 4
    eval(parse(text = paste0("quiet(grob <- grid_arrange_shared_legend(", paste0("dps", "[[", c(1:nplots), "]]", sep = '', collapse = ','), ",ncol =", ncol, ",nrow =", nrow, ", position = 'bottom',  
                                                    top=grid::textGrob(paste('KDEs of Dot Products between Principal Eigenvector Estimate and True Eigenvector Estimate from ',covariance_structure, ' Covariance Structure'), gp=grid::gpar(fontsize=10)),bottom=grid::textGrob('Red lines indicate overlapping estimated and actual principal components',gp = gpar(fontface = 3, fontsize = 9),hjust = 1,x = 1),plot = T))", sep = '')))
    
    return(list('dp' = df_dp,'g' = grob))
}

Dot Product Boxplots

#---- Q3 Plots ----

# boxplots
options(digits = 5, scipen = -2)
# As some of the kernel density estimates overlapped nearly perfectly, decided to use ridgeline plots to clearly show all three estimators
dp_diag_ml <- list('identity' = dot_products_diag_ml_identity,'linear' = dot_products_diag_ml_linear,'exponential' = dot_products_diag_ml_exponential,'toeplitz' = dot_products_diag_ml_toeplitz)
dp_ml <- list('identity' = dot_products_ml_identity,'linear' = dot_products_ml_linear,'exponential' = dot_products_ml_exponential,'toeplitz' = dot_products_ml_toeplitz)
dp_LedoitWolf <- list('identity' = dot_products_LedoitWolf_cov_estimates_identity,'linear' = dot_products_LedoitWolf_cov_estimates_linear,'exponential' = dot_products_LedoitWolf_cov_estimates_exponential,'toeplitz' = dot_products_LedoitWolf_cov_estimates_toeplitz)

plot_dps <- function(x,y,z,start,end,covariance_structure) {
    options(digits = 3, scipen = -2)
    # x is estimator list of eigens
    dps <- list()
    
    df_dp <- data.frame(matrix(NA, ncol=1, nrow=21)[-1]) # from SO
    
    s <- switch (covariance_structure,
                             "Identity" = 1,
                             "Linear" = 2,
                             "Exponential" = 3,
                             "Toeplitz" = 4
    )
    
    for (i in 1:20) {
        title <- names(my_flatten(x)[start+i])
        title <- str_remove(pattern = str_extract(pattern = "[^.]*.[^.]*",string = title),string = title)
        title <- substr(title,2,nchar(title))
        df_x <- as.data.frame(do.call(rbind, my_flatten(x)[seq(start+i,end,by = 20)]))
        df_y <- as.data.frame(do.call(rbind, my_flatten(y)[seq(start+i,end,by = 20)]))
        df_z <- as.data.frame(do.call(rbind, my_flatten(z)[seq(start+i,end,by = 20)]))
         
        
        colnames(df_x) <- 'dp_ML'
        colnames(df_y) <- 'dp_D_ML'
        colnames(df_z) <- 'dp_TP'
        quiet(df_dp <- melt(cbind(df_x,df_y,df_z)))
        g <- ggplot(data = df_dp) + 
            geom_boxplot(mapping = aes(y = value,x = variable,fill = variable),color = alpha("black", 0.1),outlier.shape = NA,alpha = 0.5) +
            geom_hline(yintercept = 1,color = 'red') +
            geom_jitter(mapping = aes(y = value,x = variable,fill = variable,color = variable),width = 0.4,size = 0.5, alpha = 0.8) +
            geom_rug(mapping = aes(x = value, colour = variable),show.legend = F,alpha = 1/2,sides = 'r') +
            labs(title = title,y = "dot products", x = "estimator") + guides(color = FALSE) +
            theme_light() + theme(axis.text.y = element_text(angle = 0, hjust = 1,  vjust = -0.5,size = 7),axis.text.x = element_blank(),axis.title = element_text(size = 7)) + theme(plot.title = element_text(size = 7)) + 
            scale_fill_discrete(name = "Estimator:",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage")) + ylim(-1,1)
        dps[[i]] <- g
    }
    
    nplots = 20
    ncol = 5
    nrow = 4
    eval(parse(text = paste0("quiet(grob <- grid_arrange_shared_legend(", paste0("dps", "[[", c(1:nplots), "]]", sep = '', collapse = ','), ",ncol =", ncol, ",nrow =", nrow, ", position = 'bottom',  
                                                    top=grid::textGrob(paste('Boxplots of Dot Products between Principal Eigenvector Estimate and True Eigenvector Estimate from ',covariance_structure, ' Covariance Structure'), gp=grid::gpar(fontsize=10)),plot = T))", sep = '')))
    
    return(list('dp' = df_dp,'g' = grob))
}

Sum of Squared Errors

Here we take the sum of squared errors of \(\Sigma - \hat{\Sigma}\), which is \(\mathrm{trace}[(\Sigma - \hat{\Sigma})(\Sigma - \hat{\Sigma})]\); for a symmetric difference matrix this is simply the squared Frobenius norm, \(\sum_{i,j}(\sigma_{ij} - \hat{\sigma}_{ij})^2\). The larger the error, the greater the deviation of the estimated covariance matrix from the true covariance matrix. The lower bound for this measure is zero, and it has no upper bound.

 SSE_cov <- function(x,m) {
    return(tr((x-m) %*% (x-m)))
 }
 
 # ML
 identity_cov_flat <- my_flatten(ml_cov_estimates[[1]])
 identity_cov_struct_flat <- rep(my_flatten(cov_structures[[1]]),80)
 SSE_ml_identity <- map2(identity_cov_flat, identity_cov_struct_flat, SSE_cov)
 
 linear_cov_flat <- my_flatten(ml_cov_estimates[[2]])
 linear_cov_struct_flat <- rep(my_flatten(cov_structures[[2]]),80)
 SSE_ml_linear <- map2(linear_cov_flat, linear_cov_struct_flat, SSE_cov)
 
 exponential_cov_flat <- my_flatten(ml_cov_estimates[[3]])
 exponential_cov_struct_flat <- rep(my_flatten(cov_structures[[3]]),80)
 SSE_ml_exponential <- map2(exponential_cov_flat, exponential_cov_struct_flat, SSE_cov)
 
 toeplitz_cov_flat <- my_flatten(ml_cov_estimates[[4]])
 toeplitz_cov_struct_flat <- rep(my_flatten(cov_structures[[4]]),80)
 SSE_ml_toeplitz <- map2(toeplitz_cov_flat, toeplitz_cov_struct_flat, SSE_cov)
 
 # Diag ML
 identity_cov_flat <- my_flatten(diag_ml_cov_estimates[[1]])
 identity_cov_struct_flat <- rep(my_flatten(cov_structures[[1]]),80)
 SSE_diag_ml_identity <- map2(identity_cov_flat, identity_cov_struct_flat, SSE_cov)
 
 linear_cov_flat <- my_flatten(diag_ml_cov_estimates[[2]])
 linear_cov_struct_flat <- rep(my_flatten(cov_structures[[2]]),80)
 SSE_diag_ml_linear <- map2(linear_cov_flat, linear_cov_struct_flat, SSE_cov)
 
 exponential_cov_flat <- my_flatten(diag_ml_cov_estimates[[3]])
 exponential_cov_struct_flat <- rep(my_flatten(cov_structures[[3]]),80)
 SSE_diag_ml_exponential <- map2(exponential_cov_flat, exponential_cov_struct_flat, SSE_cov)
 
 toeplitz_cov_flat <- my_flatten(diag_ml_cov_estimates[[4]])
 toeplitz_cov_struct_flat <- rep(my_flatten(cov_structures[[4]]),80)
 SSE_diag_ml_toeplitz <- map2(toeplitz_cov_flat, toeplitz_cov_struct_flat, SSE_cov)
 
 # Ledoit-Wolf
 identity_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[1]])
 identity_cov_struct_flat <- rep(my_flatten(cov_structures[[1]]),80)
 SSE_LedoitWolf_cov_estimates_identity <- map2(identity_cov_flat, identity_cov_struct_flat, SSE_cov)
 
 linear_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[2]])
 linear_cov_struct_flat <- rep(my_flatten(cov_structures[[2]]),80)
 SSE_LedoitWolf_cov_estimates_linear <- map2(linear_cov_flat, linear_cov_struct_flat, SSE_cov)
 
 exponential_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[3]])
 exponential_cov_struct_flat <- rep(my_flatten(cov_structures[[3]]),80)
 SSE_LedoitWolf_cov_estimates_exponential <- map2(exponential_cov_flat, exponential_cov_struct_flat, SSE_cov)
 
 toeplitz_cov_flat <- my_flatten(LedoitWolf_cov_estimates[[4]])
 toeplitz_cov_struct_flat <- rep(my_flatten(cov_structures[[4]]),80)
 SSE_LedoitWolf_cov_estimates_toeplitz <- map2(toeplitz_cov_flat, toeplitz_cov_struct_flat, SSE_cov)

SSE Density Plots

#---- Q4 Plots ----
 
# As some of the kernel density estimates overlapped nearly perfectly, decided to use ridgeline plots to clearly show all three estimators
sse_diag_ml <- list('identity' = SSE_diag_ml_identity,'linear' = SSE_diag_ml_linear,'exponential' = SSE_diag_ml_exponential,'toeplitz' = SSE_diag_ml_toeplitz)
sse_ml <- list('identity' = SSE_ml_identity,'linear' = SSE_ml_linear,'exponential' = SSE_ml_exponential,'toeplitz' = SSE_ml_toeplitz)
sse_LedoitWolf <- list('identity' = SSE_LedoitWolf_cov_estimates_identity,'linear' = SSE_LedoitWolf_cov_estimates_linear,'exponential' = SSE_LedoitWolf_cov_estimates_exponential,'toeplitz' = SSE_LedoitWolf_cov_estimates_toeplitz)

plot_sses <- function(x,y,z,start,end,covariance_structure) {
    options(digits = 3, scipen = -2)
    # x is estimator list of eigens
    sses <- list()
    
    df_sse <- data.frame(matrix(NA, ncol=1, nrow=21)[-1]) # from SO
    
    s <- switch (covariance_structure,
                             "Identity" = 1,
                             "Linear" = 2,
                             "Exponential" = 3,
                             "Toeplitz" = 4
    )
    
    for (i in 1:20) {
        title <- names(my_flatten(x)[start+i])
        title <- str_remove(pattern = str_extract(pattern = "[^.]*.[^.]*",string = title),string = title)
        title <- substr(title,2,nchar(title))
        #title <- substr(title,17,title) # trim off "identity.identity"
        df_x <- as.data.frame(do.call(rbind, my_flatten(x)[seq(start+i,end,by = 20)]))
        df_y <- as.data.frame(do.call(rbind, my_flatten(y)[seq(start+i,end,by = 20)]))
        df_z <- as.data.frame(do.call(rbind, my_flatten(z)[seq(start+i,end,by = 20)]))
        
        
        colnames(df_x) <- 'sse_ML'
        colnames(df_y) <- 'sse_D_ML'
        colnames(df_z) <- 'sse_TP'
        quiet(df_sse <- melt(cbind(df_x,df_y,df_z)))
        g <- ggplot(data = df_sse) + 
            #geom_density(mapping = aes(x = value,color = variable,fill = variable),alpha = 0.5) +
            #geom_density_ridges2(mapping = aes(x = value,y = variable,color = variable,fill = variable)) +
            ridges_+
            rug_ +
            labs(title = title,y = "density", x = "sse") +
            theme_ridge + guides(color = FALSE) + scale_y_discrete(expand = expand_scale(add = c(0.2, 2.5))) +
            scale_fill_discrete(name = "Estimator:",labels = c("Maximum Likelihood", "Diagonalised Maximum Likelihood", "Ledoit-Wolf Shrinkage"))
        sses[[i]] <- g
    }

    nplots = 20
    ncol = 5
    nrow = 4
    eval(parse(text = paste0("quiet(grob <- grid_arrange_shared_legend(", paste0("sses", "[[", c(1:nplots), "]]", sep = '', collapse = ','), ",ncol =", ncol, ",nrow =", nrow, ", position = 'bottom',  
                                                    top=grid::textGrob(paste('KDEs of Covariance Estimate SSE from',covariance_structure,' Covariance Structure'), gp=grid::gpar(fontsize=12)),plot = T))", sep = '')))
    
    #quiet(grob <- grid_arrange_shared_legend(sses[[1]],sses[[1]],top = "hello")) # test success
    
    #quiet(grob <- grid_arrange_shared_legend_plotlist(plotlist = sses,ncol = 5,top = "hello"))#,bottom = b_)) #grid.arrange(grobs = sses,ncol = 5) ,top = "test"
    
    return(list('sse' = df_sse,'g' = grob))
}

Estimate Stability

To assess the stability of the covariance matrix estimates, we calculate the variance of the sum of squared errors across replications. As the range of these variances is very large, we take their log before plotting. We also display the raw error variances for reference.

Hence, negative values (in blue) represent low error variances (less than 1), while positive values represent higher variances.

# Multiplots with heatmap turns out to be impossible with par() and grid.arrange() - have to drop down to very low level graphics or switch to ggplot2, so used SO code

error_variance <- function(x, matrix_out = TRUE, plot = TRUE, title = "") {
    # Per-parameter variance of the errors across replications
    df_var <- data.frame(Parameters = names(unlist(x)), Errors = unname(unlist(x)))
    vars <- by(df_var$Errors, df_var$Parameters, var)[unique(names(unlist(x)))]
    if (!matrix_out) {
        print(vars)
        return(invisible(NULL))
    }
    m <- matrix(vars, nrow = 4, byrow = TRUE)
    rownames(m) <- c("n=p", "n=p log(p)", "n=5p log(p)", "n=10p log(p)")
    colnames(m) <- c("p=5", "p=10", "p=50", "p=100", "p=250")
    return(m)
}

plot_heats <- function(mats, titles) {
    library(gplots)
    lapply(1:4, function(i) {
        heatmap.2(log(mats[[i]]), Rowv = NULL, Colv = NULL,
                  col = rev(rainbow(20 * 10, start = 0 / 6, end = 4 / 6)),
                  scale = "none",
                  margins = c(4, 4),                       # ("margin.Y", "margin.X")
                  symkey = FALSE,
                  symbreaks = FALSE,
                  dendrogram = 'none',
                  density.info = 'histogram',
                  denscol = "black",
                  keysize = 1,
                  key.par = list(mar = c(3.5, 0, 3, 0)),   # (bottom, left, top, right)
                  # lmat: extra lattice section (5) added for padding
                  lmat = rbind(c(4, 4, 2), c(5, 1, 3)), lhei = c(2.5, 5), lwid = c(1, 10, 1),
                  srtRow = 45, srtCol = 45,
                  key.title = titles[[i]],
                  tracecol = "#303030",
                  trace = "both",
                  cellnote = round(log(mats[[i]]), 2),
                  notecol = "black",
                  cexRow = 0.8,
                  cexCol = 0.8)
    })
}

ML - Heatmaps (log)

Identity
p=5 p=10 p=50 p=100 p=250
n=p 10.00 0.62 0.11 0.06 0.03
n=p log(p) 1.13 0.13 0.01 0.00 0.00
n=5p log(p) 0.01 0.00 0.00 0.00 0.00
n=10p log(p) 0.00 0.00 0.00 0.00 0.00
Linear
p=5 p=10 p=50 p=100 p=250
n=p 10.00 0.62 0.11 0.06 0.03
n=p log(p) 1.13 0.13 0.01 0.00 0.00
n=5p log(p) 0.01 0.00 0.00 0.00 0.00
n=10p log(p) 0.00 0.00 0.00 0.00 0.00
Exponential
p=5 p=10 p=50 p=100 p=250
n=p 93.92 1572.93 19768.56 35312 67906.5
n=p log(p) 41.41 125.86 845.77 2739 8220.4
n=5p log(p) 1.21 6.80 32.45 153 613.6
n=10p log(p) 0.13 0.92 7.37 52 47.2
Toeplitz
p=5 p=10 p=50 p=100 p=250
n=p 20.03 1.34 0.24 0.1 0.06
n=p log(p) 2.13 0.15 0.02 0.0 0.00
n=5p log(p) 0.03 0.00 0.00 0.0 0.00
n=10p log(p) 0.01 0.00 0.00 0.0 0.00

Diag ML - Heatmaps (log)

Identity
p=5 p=10 p=50 p=100 p=250
n=p 38.64 11.38 16.36 13.44 8.38
n=p log(p) 9.79 2.31 0.45 0.37 0.19
n=5p log(p) 0.02 0.04 0.01 0.01 0.00
n=10p log(p) 0.01 0.01 0.00 0.00 0.00
Linear
p=5 p=10 p=50 p=100 p=250
n=p 38.64 11.38 16.36 13.44 8.38
n=p log(p) 9.79 2.31 0.45 0.37 0.19
n=5p log(p) 0.02 0.04 0.01 0.01 0.00
n=10p log(p) 0.01 0.01 0.00 0.00 0.00
Exponential
p=5 p=10 p=50 p=100 p=250
n=p 98.86 2039.79 19957.93 34820.7 67774.5
n=p log(p) 46.71 125.78 854.85 2725.6 8212.1
n=5p log(p) 1.19 6.84 32.55 153.0 613.0
n=10p log(p) 0.17 0.90 7.38 51.8 47.1
Toeplitz
p=5 p=10 p=50 p=100 p=250
n=p 237.34 13.91 26.33 35.39 12.43
n=p log(p) 7.70 2.25 0.96 0.80 0.39
n=5p log(p) 0.23 0.08 0.04 0.02 0.02
n=10p log(p) 0.13 0.01 0.02 0.01 0.01

Ledoit-Wolf - Heatmaps (log)

Identity
p=5 p=10 p=50 p=100 p=250
n=p 0.84 0.38 0.04 0.02 0
n=p log(p) 0.49 0.05 0.00 0.00 0
n=5p log(p) 0.01 0.00 0.00 0.00 0
n=10p log(p) 0.00 0.00 0.00 0.00 0
Linear
p=5 p=10 p=50 p=100 p=250
n=p 0.84 0.38 0.04 0.02 0
n=p log(p) 0.49 0.05 0.00 0.00 0
n=5p log(p) 0.01 0.00 0.00 0.00 0
n=10p log(p) 0.00 0.00 0.00 0.00 0
Exponential
p=5 p=10 p=50 p=100 p=250
n=p 25.04 498.72 39888.5 111844.73 239373
n=p log(p) 17.57 71.25 737.5 2704.62 16646
n=5p log(p) 1.13 13.79 90.2 100.54 137
n=10p log(p) 0.25 0.67 10.9 6.73 128
Toeplitz
p=5 p=10 p=50 p=100 p=250
n=p 1.65 1.09 1.84 1.03 1.45
n=p log(p) 2.67 0.46 0.26 0.20 0.13
n=5p log(p) 0.29 0.06 0.02 0.03 0.01
n=10p log(p) 0.03 0.01 0.00 0.00 0.00

Numerical Summaries

It’s relatively meaningless to plot the whole data set of eigenvalues and eigenvectors without a visual reference to the eigenvalues and eigenvectors of the original covariance structures. However, we can get a sense of the overall performance of the estimators by comparing them across all combinations of the covariance structures.

#---- Summary Plots ---- 
n_sort_list_summary <- function(x) { return(unlist(x)[naturalorder(names(unlist(x)))]) }
#unique(names(unlist(dot_products_LedoitWolf_cov_estimates_toeplitz)[naturalorder(names(unlist(dot_products_LedoitWolf_cov_estimates_toeplitz)))]))

# Q3
par(mfrow = c(3,4))
plot(n_sort_list_summary(dot_products_ml_identity),bg = 'red',main = "Identity - ML",pch = 21,ylab = "Dot Products - ML")
plot(n_sort_list_summary(dot_products_ml_linear),bg = 'blue',main = "Linear - ML",pch = 21,ylab = "Dot Products - ML")
plot(n_sort_list_summary(dot_products_ml_exponential),bg = 'purple',main = "Exponential - ML",pch = 21,ylab = "Dot Products - ML")
plot(n_sort_list_summary(dot_products_ml_toeplitz),bg = 'green',main = "Toeplitz - ML",pch = 21,ylab = "Dot Products - ML")
plot(n_sort_list_summary(dot_products_diag_ml_identity),bg = 'red',main = "Identity - Diag ML",pch = 21,ylab = "Dot Products - Diag ML")
plot(n_sort_list_summary(dot_products_diag_ml_linear),bg = 'blue',main = "Linear - Diag ML",pch = 21,ylab = "Dot Products - Diag ML")
plot(n_sort_list_summary(dot_products_diag_ml_exponential),bg = 'purple',main = "Exponential - Diag ML",pch = 21,ylab = "Dot Products - Diag ML")
plot(n_sort_list_summary(dot_products_diag_ml_toeplitz),bg = 'green',main = "Toeplitz - Diag ML",pch = 21,ylab = "Dot Products - Diag ML")
plot(n_sort_list_summary(dot_products_LedoitWolf_cov_estimates_identity),bg = 'red',main = "Identity - LW",pch = 21,ylab = "Dot Products - LW")
plot(n_sort_list_summary(dot_products_LedoitWolf_cov_estimates_linear),bg = 'blue',main = "Linear - LW",pch = 21,ylab = "Dot Products  - LW")
plot(n_sort_list_summary(dot_products_LedoitWolf_cov_estimates_exponential),bg = 'purple',main = "Exponential - LW",pch = 21,ylab = "Dot Products  - LW")
plot(n_sort_list_summary(dot_products_LedoitWolf_cov_estimates_toeplitz),bg = 'green',main = "Toeplitz - LW",pch = 21,ylab = "Dot Products  - LW")  

Numerical Summaries: Dot Products
Covariance Structure Min. 1st Qu. Median Mean 3rd Qu. Max.
Identity 2 2 1 2 1 2
Linear 2 2 1 3 3 2
Exponential 2 3 3 4 3 2
Toeplitz 1 1 2 1 2 1

Discussion

In terms of trace estimation, it is possible that too few replications of each parameter combination (20, in our case) were performed to differentiate the properties of the different estimators.

The maximum likelihood estimator is known to have substantial error when the dimensionality of the true matrix is large relative to the sample size, which propagates into optimisation procedures that rely on estimates from this estimator (Ledoit & Wolf, 2004). It is also not invertible when \(n \leq p\). It has the advantage of being easy to compute and unbiased.
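The singularity at \(n \leq p\) is easy to demonstrate; a minimal sketch, using cov() (which differs from the ML estimator only by the factor \(n/(n-1)\)):

```r
# With n = 5 observations of p = 10 variables, the sample covariance
# has rank at most n - 1 = 4, so it cannot be inverted.
set.seed(1)
X <- matrix(rnorm(5 * 10), nrow = 5)
S <- cov(X)
qr(S)$rank                 # 4: far short of the full rank (10) needed for inversion
```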

In contrast, the Ledoit-Wolf estimator handles covariance estimation far better than maximum likelihood, especially at high values of p.

Conclusion

References

Primary Literature

Additional Measures of Dispersion. (n.d.). Retrieved from https://onlinecourses.science.psu.edu/stat505/book/export/html/645.

Bai, J., & Shi, S. (2011). Estimating High Dimensional Covariance Matrices and Its Applications. Columbia University. https://doi.org/10.7916/D8RJ4SGP

Coates, A., & Ng, A. Y. (2012). Learning Feature Representations with K-Means. In Lecture Notes in Computer Science (pp. 561-580). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_30

Kessy, A., Lewin, A., & Strimmer, K. (2018). Optimal Whitening and Decorrelation. The American Statistician, 72(4), 309-314. https://doi.org/10.1080/00031305.2016.1277159

Ledoit, O., & Wolf, M. (2004). Honey, I Shrunk the Sample Covariance Matrix. The Journal of Portfolio Management, 30(4), 110-119. https://doi.org/10.3905/jpm.2004.110

Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365-411. https://doi.org/10.1016/s0047-259x(03)00096-4

Mukherjee, B. N., & Maiti, S. S. (1988). On some properties of positive definite Toeplitz matrices and their possible applications. Linear Algebra and Its Applications, 102, 211-240. https://doi.org/10.1016/0024-3795(88)90326-6

Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559-572. https://doi.org/10.1080/14786440109462720

Rao, K. R., & Yip, P. C. (2001). The Karhunen-Loève Transform. In The Transform and Data Compression Handbook (pp. 20-55). Boca Raton, FL: CRC Press.

Reris, R., & Brooks, J. P. (2015). Principal Component Analysis and Optimization: A Tutorial. In Operations Research and Computing: Algorithms and Software for Analytics (pp. 212-225). INFORMS. https://doi.org/10.1287/ics.2015.0016

Serra, A., Coretto, P., Fratello, M., & Tagliaferri, R. (2017). Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data. Bioinformatics, 34(4), 625-634. https://doi.org/10.1093/bioinformatics/btx642

Supplementary Material

[1] McIntosh, L., & Hardcastle, L. (2017). NBIO 228: Math Tools for Neuroscience, Linear Algebra: Part II [PowerPoint slides]. Retrieved from https://web.stanford.edu/class/nbio228-01/lectures/

[2] Department of Mathematics, Oregon State University (1996). Eigenvalues and Eigenvectors [webpage]. Retrieved from https://math.oregonstate.edu/home/programs/undergrad/CalculusQuestStudyGuides/vcalc/eigen/eigen.html